
Conversation


@Amxx Amxx commented Oct 9, 2025

See: OpenZeppelin/openzeppelin-contracts#5820

Summary by CodeRabbit

  • New Features

    • Added FastLZ, LZ4, and Snappy decompression libraries with support for both memory and calldata inputs.
    • Improved validation and error handling for malformed data.
    • Optimized, gas-conscious implementations for on-chain decompression.
  • Tests

    • Introduced comprehensive round-trip tests for all algorithms using diverse inputs (strings, large text, random bytes).
    • Added calldata vs memory coverage and a gas benchmarking suite across algorithms.
  • Chores

    • Added development dependencies for compression tooling (lz4js, snappy) and utilities (solady).

@Amxx Amxx requested a review from a team as a code owner October 9, 2025 13:25

coderabbitai bot commented Oct 9, 2025

Walkthrough

Adds three new Solidity decompression libraries (FastLZ, LZ4, Snappy) with memory and calldata variants, comprehensive JS/Foundry tests including gas benchmarking, a shared test suite, and devDependencies for compression tooling. Implementations are written in memory-safe assembly, include basic validation, and return bytes output or revert on decoding inconsistencies.

Changes

Cohort / File(s) — Summary

  • Compression Libraries — contracts/utils/compression/FastLZ.sol, contracts/utils/compression/LZ4.sol, contracts/utils/compression/Snappy.sol
    Introduces internal decompression libraries for FastLZ (level-1), LZ4 (framed), and Snappy; each provides memory and calldata entry points; includes error types for LZ4/Snappy; implementations use memory-safe inline assembly, parse tokens/blocks, manage buffers, and revert on invalid input.
  • Solidity Test (FastLZ) — test/utils/compression/FastLZ.t.sol
    Adds a Foundry test contract validating the FastLZ round-trip using Solady’s LibZip for compression; tests memory and calldata paths.
  • JS Tests per Algorithm — test/utils/compression/FastLZ.test.js, test/utils/compression/LZ4.test.js, test/utils/compression/Snappy.test.js
    Adds Hardhat tests for each library; compresses inputs with the respective JS libraries (LibZip/LZ4JS/Snappy), invokes contract methods for memory and calldata, and asserts round-trip equality.
  • Gas Benchmarking — test/utils/compression/gas.test.js
    Adds a gas comparison suite across FastLZ, LZ4, and Snappy; estimates gas per case and prints aggregate results.
  • Shared Testsuite — test/utils/compression/testsuite.js
    Introduces common test inputs (strings, lorem ipsum, random buffer) exported as a single array.
  • Dev Tooling — package.json
    Adds devDependencies: lz4js, snappy, solady; no runtime dependency or script changes.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor T as Test/Caller
    participant F as FastLZ
    participant M as Memory
    participant C as Calldata

    rect rgb(240,248,255)
    note over T,F: FastLZ decompression (level-1)
    T->>F: decompress(input bytes)
    F->>M: allocate output buffer
    loop Parse chunks
        F-->>F: read token (literal or back-ref)
        alt Literal
            F->>M: copy literal bytes
        else Back-reference
            F-->>F: compute length, offset
            F->>M: copy from output window
        end
    end
    F-->>T: output bytes (revert if misaligned)
    end

    rect rgb(245,255,240)
    note over T,F: Calldata variant
    T->>F: decompressCalldata(input calldata)
    F-->>C: calldataload reads
    F->>M: write output
    F-->>T: output bytes
    end
sequenceDiagram
    autonumber
    actor T as Test/Caller
    participant L as LZ4
    participant M as Memory

    rect rgb(248,248,255)
    note over T,L: LZ4 framed decompression
    T->>L: decompress(input)
    L-->>L: validate magic, version, flags
    loop For each block
        alt Uncompressed block
            L->>M: copy block bytes
        else Compressed block
            loop Tokens
                L-->>L: parse token (litLen/matchLen)
                L->>M: copy literals
                L-->>L: read offset
                L->>M: copy match via sliding window
            end
        end
    end
    L-->>T: output (revert on invalid frame/EOF)
    end
sequenceDiagram
    autonumber
    actor T as Test/Caller
    participant S as Snappy
    participant M as Memory

    rect rgb(255,248,240)
    note over T,S: Snappy uncompress
    T->>S: uncompress(input)
    S->>M: precompute output length, allocate
    loop Tokens
        alt Literal
            S->>M: copy literal bytes
        else Copy (1-3)
            S-->>S: decode length+offset
            S->>M: windowed copy
        end
    end
    S-->>T: output (revert on mismatch)
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

I nibbled on bytes with twitchy delight,
Unzipped the streams in moonlit night.
FastLZ hops, LZ4 glides,
Snappy snaps with sliding strides.
Gas prints twinkle, charts align—
Carrots of code, decompress divine.
Thump-thump—merges feel just fine.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: The title succinctly describes the main change by naming the three new compression libraries introduced in this pull request, accurately reflecting the core additions without unnecessary detail.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (3)
contracts/utils/compression/Snappy.sol (3)

166-169: Duplicate assignment to len in case 3 (calldata)

Redundant line; remove the duplicate.

-                    len := add(shr(2, c), 1)
-                    len := add(shr(2, c), 1)
+                    len := add(shr(2, c), 1)

97-99: Use the custom error DecodingFailure() instead of bare revert

You declared error DecodingFailure() but revert(0, 0) loses debuggability and consistency. Emit the custom error selector.

Example Yul pattern:

  • mstore(0x00, DecodingFailure.selector) and revert(0x1c, 0x04)
-            if iszero(and(eq(inputPtr, inputEnd), eq(outputPtr, mload(0x40)))) {
-                revert(0, 0)
-            }
+            if iszero(and(eq(inputPtr, inputEnd), eq(outputPtr, mload(0x40)))) {
+                // revert DecodingFailure()
+                mstore(0x00, /* DecodingFailure.selector */ 0x00000000)
+                revert(0x1c, 0x04)
+            }

Replace 0x00000000 with the actual selector of DecodingFailure(). If you prefer, I can push a patch computing and inlining the selector.

Also applies to: 185-187


38-41: Optional: align free memory pointer and decouple the final check from FMP

Best practice is to 32-byte align the free memory pointer. If you align FMP, compare outputPtr to a saved outputEnd rather than mload(0x40).

Sketch:

  • After computing outputEnd, set mstore(0x40, and(add(outputEnd, 31), not(31))).
  • Keep a local let expectedEnd := outputEnd; at the end check eq(outputPtr, expectedEnd) instead of eq(outputPtr, mload(0x40)).

Also applies to: 124-127, 97-99, 185-187

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits b0ddd27 and e5f790b.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (10)
  • contracts/utils/compression/FastLZ.sol (1 hunks)
  • contracts/utils/compression/LZ4.sol (1 hunks)
  • contracts/utils/compression/Snappy.sol (1 hunks)
  • package.json (1 hunks)
  • test/utils/compression/FastLZ.t.sol (1 hunks)
  • test/utils/compression/FastLZ.test.js (1 hunks)
  • test/utils/compression/LZ4.test.js (1 hunks)
  • test/utils/compression/Snappy.test.js (1 hunks)
  • test/utils/compression/gas.test.js (1 hunks)
  • test/utils/compression/testsuite.js (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
test/utils/compression/testsuite.js (4)
test/utils/compression/FastLZ.test.js (4)
  • require (1-1)
  • require (2-2)
  • require (3-3)
  • require (5-5)
test/utils/compression/LZ4.test.js (3)
  • require (1-1)
  • require (2-2)
  • require (3-3)
test/utils/compression/Snappy.test.js (3)
  • require (1-1)
  • require (2-2)
  • require (3-3)
test/utils/compression/gas.test.js (4)
  • require (1-1)
  • require (2-2)
  • require (3-3)
  • require (7-7)
test/utils/compression/FastLZ.test.js (3)
test/utils/compression/LZ4.test.js (7)
  • require (1-1)
  • require (2-2)
  • require (3-3)
  • mock (10-10)
  • raw (22-22)
  • hex (23-23)
  • compressed (24-24)
test/utils/compression/gas.test.js (6)
  • require (1-1)
  • require (2-2)
  • require (3-3)
  • require (7-7)
  • raw (27-27)
  • hex (28-28)
test/utils/compression/testsuite.js (1)
  • require (1-1)
test/utils/compression/gas.test.js (1)
test/utils/compression/testsuite.js (1)
  • require (1-1)
test/utils/compression/Snappy.test.js (4)
test/utils/compression/FastLZ.test.js (9)
  • require (1-1)
  • require (2-2)
  • require (3-3)
  • require (5-5)
  • unittests (7-7)
  • mock (10-10)
  • raw (22-22)
  • hex (23-23)
  • compressed (24-24)
test/utils/compression/LZ4.test.js (8)
  • require (1-1)
  • require (2-2)
  • require (3-3)
  • unittests (7-7)
  • mock (10-10)
  • raw (22-22)
  • hex (23-23)
  • compressed (24-24)
test/utils/compression/gas.test.js (9)
  • require (1-1)
  • require (2-2)
  • require (3-3)
  • require (7-7)
  • snappy (6-6)
  • snappy (14-14)
  • unittests (9-9)
  • raw (27-27)
  • hex (28-28)
test/utils/compression/testsuite.js (1)
  • require (1-1)
test/utils/compression/LZ4.test.js (4)
test/utils/compression/FastLZ.test.js (9)
  • require (1-1)
  • require (2-2)
  • require (3-3)
  • require (5-5)
  • unittests (7-7)
  • mock (10-10)
  • raw (22-22)
  • hex (23-23)
  • compressed (24-24)
test/utils/compression/Snappy.test.js (8)
  • require (1-1)
  • require (2-2)
  • require (3-3)
  • unittests (7-7)
  • mock (10-10)
  • raw (22-22)
  • hex (23-23)
  • compressed (24-24)
test/utils/compression/gas.test.js (8)
  • require (1-1)
  • require (2-2)
  • require (3-3)
  • require (7-7)
  • lz4js (5-5)
  • unittests (9-9)
  • raw (27-27)
  • hex (28-28)
test/utils/compression/testsuite.js (1)
  • require (1-1)
🪛 GitHub Check: codespell
test/utils/compression/testsuite.js

[failure] 28-28:
varius ==> various


[failure] 27-27:
Nam ==> Name


[failure] 25-25:
varius ==> various

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: slither
  • GitHub Check: tests
  • GitHub Check: coverage

Comment on lines +37 to +51
case 7 {
    let len := add(9, byte(1, chunk))
    for {
        let i := 0
        let ofs := add(add(shl(8, and(first, 31)), byte(2, chunk)), 1)
        let ref := sub(outputPtr, ofs)
        let step := xor(len, mul(lt(ofs, len), xor(ofs, len)))
    } lt(i, len) {
        i := add(i, step)
    } {
        mcopy(add(outputPtr, i), add(ref, i), step)
    }
    inputPtr := add(inputPtr, 3)
    outputPtr := add(outputPtr, len)
}

⚠️ Potential issue | 🔴 Critical

Fix FastLZ type‑7 byte ordering.

For type‑7 matches the format is ctrl | offsetLow | lengthExtra. The code currently reads byte(1, chunk) as the length byte and byte(2, chunk) as the offset byte, then advances inputPtr by 3. This swaps the order: valid streams produced by the reference encoder/LibZip place the offset immediately after ctrl, so we subtract the wrong offset (using the length byte) and read the real offset as the next chunk’s first byte. Result: long matches (>=9 bytes) decode garbage or revert when the pointer check at Line 68 trips.

Reorder the reads so we pull the offset from byte(1, chunk), the length extension from byte(2, chunk), and only consume the extra length byte when we actually use it. One possible fix:

-                case 7 {
-                    let len := add(9, byte(1, chunk))
-                    for {
-                        let i := 0
-                        let ofs := add(add(shl(8, and(first, 31)), byte(2, chunk)), 1)
+                case 7 {
+                    let extra := byte(2, chunk)
+                    let ofs := add(add(shl(8, and(first, 31)), byte(1, chunk)), 1)
+                    let len := add(9, extra)
+                    for {
+                        let i := 0
                         let ref := sub(outputPtr, ofs)
                         let step := xor(len, mul(lt(ofs, len), xor(ofs, len)))
                     } lt(i, len) {
                         i := add(i, step)
                     } {
                         mcopy(add(outputPtr, i), add(ref, i), step)
                     }
-                    inputPtr := add(inputPtr, 3)
+                    inputPtr := add(inputPtr, 3)

(and mirror the change in the calldata loop).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
case 7 {
    let len := add(9, byte(1, chunk))
    for {
        let i := 0
        let ofs := add(add(shl(8, and(first, 31)), byte(2, chunk)), 1)
        let ref := sub(outputPtr, ofs)
        let step := xor(len, mul(lt(ofs, len), xor(ofs, len)))
    } lt(i, len) {
        i := add(i, step)
    } {
        mcopy(add(outputPtr, i), add(ref, i), step)
    }
    inputPtr := add(inputPtr, 3)
    outputPtr := add(outputPtr, len)
}
case 7 {
    // pull offset low first, then length extension
    let extra := byte(2, chunk)
    let ofs := add(add(shl(8, and(first, 31)), byte(1, chunk)), 1)
    let len := add(9, extra)
    for {
        let i := 0
        let ref := sub(outputPtr, ofs)
        let step := xor(len, mul(lt(ofs, len), xor(ofs, len)))
    } lt(i, len) {
        i := add(i, step)
    } {
        mcopy(add(outputPtr, i), add(ref, i), step)
    }
    inputPtr := add(inputPtr, 3)
    outputPtr := add(outputPtr, len)
}
🤖 Prompt for AI Agents
In contracts/utils/compression/FastLZ.sol around lines 37 to 51, the type-7
branch reads the bytes in the wrong order (using byte(1, chunk) as the length
and byte(2, chunk) as the offset) and always advances inputPtr by 3; update it
to read the offset from byte(1, chunk) and the length-extension from byte(2,
chunk), compute ofs using the correct offset byte, only advance inputPtr by 2
normally and by 3 when the extra length byte is actually consumed, and mirror
the same byte-order and pointer-advance fix in the calldata decoding loop so
offsets and lengths match the reference encoder/LibZip format.

Comment on lines +38 to +41
output := mload(0x40)
let outputPtr := add(output, 0x20)
mstore(output, outputLength)
mstore(0x40, add(outputPtr, outputLength))

⚠️ Potential issue | 🔴 Critical

Guard FMP computation against overflow

add(outputPtr, outputLength) can wrap on malicious varint; guard before storing to 0x40.

-            mstore(0x40, add(outputPtr, outputLength))
+            let outputEnd := add(outputPtr, outputLength)
+            // overflow guard: outputEnd must be >= outputPtr
+            if lt(outputEnd, outputPtr) { revert(0, 0) }
+            mstore(0x40, outputEnd)
📝 Committable suggestion


Suggested change
output := mload(0x40)
let outputPtr := add(output, 0x20)
mstore(output, outputLength)
mstore(0x40, add(outputPtr, outputLength))
output := mload(0x40)
let outputPtr := add(output, 0x20)
mstore(output, outputLength)
let outputEnd := add(outputPtr, outputLength)
// overflow guard: outputEnd must be >= outputPtr
if lt(outputEnd, outputPtr) { revert(0, 0) }
mstore(0x40, outputEnd)
🤖 Prompt for AI Agents
In contracts/utils/compression/Snappy.sol around lines 38 to 41, the
mstore(0x40, add(outputPtr, outputLength)) writes the new free-memory pointer
without checking for overflow from add(outputPtr, outputLength); compute the end
pointer into a temp variable, validate that end >= outputPtr (and optionally
that end does not exceed a reasonable bound), and revert/invalid on failure
before calling mstore(0x40, end). Implement the check in assembly: compute let
end := add(outputPtr, outputLength), if lt(end, outputPtr) { revert(0,0) } (or
use invalid) else mstore(0x40, end).

Comment on lines +62 to +65
mcopy(outputPtr, inputPtr, len)
inputPtr := add(inputPtr, len)
outputPtr := add(outputPtr, len)
// continue to skip the offset copy logic that is shared by the other 3 cases

⚠️ Potential issue | 🟠 Major

Bounds-check literal copy in memory variant

In the memory path, mcopy can read past the input slice without an immediate revert. Add a check to fail early.

-                    mcopy(outputPtr, inputPtr, len)
+                    // bounds check: inputPtr + len must not exceed inputEnd
+                    if gt(add(inputPtr, len), inputEnd) { revert(0, 0) }
+                    mcopy(outputPtr, inputPtr, len)
📝 Committable suggestion


Suggested change
mcopy(outputPtr, inputPtr, len)
inputPtr := add(inputPtr, len)
outputPtr := add(outputPtr, len)
// continue to skip the offset copy logic that is shared by the other 3 cases
// bounds check: inputPtr + len must not exceed inputEnd
if gt(add(inputPtr, len), inputEnd) { revert(0, 0) }
mcopy(outputPtr, inputPtr, len)
inputPtr := add(inputPtr, len)
outputPtr := add(outputPtr, len)
// continue to skip the offset copy logic that is shared by the other 3 cases

Comment on lines +83 to +95
// copying in will not work if the offset is larger than the len being copied, so we compute
// `step = Math.min(len, offset)` and use it for the memory copy in chunks
for {
let ptr := outputPtr
let end := add(outputPtr, len)
let step := xor(offset, mul(lt(len, offset), xor(len, offset))) // min(len, offset)
} lt(ptr, end) {
ptr := add(ptr, step)
} {
mcopy(ptr, sub(ptr, offset), step)
}
outputPtr := add(outputPtr, len)
}

⚠️ Potential issue | 🔴 Critical

Guard invalid offsets to avoid infinite loop and OOB back-references

  • offset == 0 makes step == 0, causing an infinite loop.
  • offset > bytes already produced reads before the start of the output buffer (memory disclosure/corruption).

Add hard checks before the back-reference copy.

                 // copying in will not work if the offset is larger than the len being copied, so we compute
                 // `step = Math.min(len, offset)` and use it for the memory copy in chunks
+                // Reject invalid offsets: zero would stall the loop; too-large offsets read before output start.
+                if or(iszero(offset), gt(offset, sub(outputPtr, add(output, 0x20)))) { revert(0, 0) }
                 for {
                     let ptr := outputPtr
                     let end := add(outputPtr, len)
                     let step := xor(offset, mul(lt(len, offset), xor(len, offset))) // min(len, offset)
                 } lt(ptr, end) {
                     ptr := add(ptr, step)
                 } {
                     mcopy(ptr, sub(ptr, offset), step)
                 }
📝 Committable suggestion


Suggested change
// copying in will not work if the offset is larger than the len being copied, so we compute
// `step = Math.min(len, offset)` and use it for the memory copy in chunks
for {
let ptr := outputPtr
let end := add(outputPtr, len)
let step := xor(offset, mul(lt(len, offset), xor(len, offset))) // min(len, offset)
} lt(ptr, end) {
ptr := add(ptr, step)
} {
mcopy(ptr, sub(ptr, offset), step)
}
outputPtr := add(outputPtr, len)
}
// copying in will not work if the offset is larger than the len being copied, so we compute
// `step = Math.min(len, offset)` and use it for the memory copy in chunks
// Reject invalid offsets: zero would stall the loop; too-large offsets read before output start.
if or(iszero(offset), gt(offset, sub(outputPtr, add(output, 0x20)))) { revert(0, 0) }
for {
let ptr := outputPtr
let end := add(outputPtr, len)
let step := xor(offset, mul(lt(len, offset), xor(len, offset))) // min(len, offset)
} lt(ptr, end) {
ptr := add(ptr, step)
} {
mcopy(ptr, sub(ptr, offset), step)
}
outputPtr := add(outputPtr, len)
}
🤖 Prompt for AI Agents
In contracts/utils/compression/Snappy.sol around lines 83 to 95, the
back-reference copy loop fails to guard invalid offsets (offset == 0 yields step
== 0 causing an infinite loop, and offset > bytes already produced reads memory
before the output buffer). Before computing step and entering the loop, add
explicit assembly checks that revert on invalid back-references: if offset is
zero revert, and if offset is greater than the number of bytes already produced
(computed as sub(outputPtr, destPtr) or equivalent) revert; only then compute
step = min(len, offset) and proceed with the loop. Ensure the revert uses a safe
mechanism (e.g. revert(0,0) or a proper error selector) and that produced-length
uses the pointers available in the surrounding code.

Comment on lines +124 to +127
output := mload(0x40)
let outputPtr := add(output, 0x20)
mstore(output, outputLength)
mstore(0x40, add(outputPtr, outputLength))

⚠️ Potential issue | 🔴 Critical

Guard FMP computation against overflow (calldata variant)

Mirror the overflow guard in the calldata path.

-            mstore(output, outputLength)
-            mstore(0x40, add(outputPtr, outputLength))
+            mstore(output, outputLength)
+            let outputEnd := add(outputPtr, outputLength)
+            if lt(outputEnd, outputPtr) { revert(0, 0) }
+            mstore(0x40, outputEnd)
📝 Committable suggestion


Suggested change
output := mload(0x40)
let outputPtr := add(output, 0x20)
mstore(output, outputLength)
mstore(0x40, add(outputPtr, outputLength))
output := mload(0x40)
let outputPtr := add(output, 0x20)
mstore(output, outputLength)
let outputEnd := add(outputPtr, outputLength)
if lt(outputEnd, outputPtr) { revert(0, 0) }
mstore(0x40, outputEnd)
🤖 Prompt for AI Agents
In contracts/utils/compression/Snappy.sol around lines 124 to 127, add the same
overflow check used in the calldata path when advancing the free memory pointer:
compute the new pointer (end = outputPtr + outputLength), validate that end is
not less than outputPtr (indicating overflow) and revert/invalid if it is, then
store end into 0x40; this mirrors the calldata-path guard and prevents FMP
wraparound.

Comment on lines +148 to +152
// copy len bytes from input to output in chunks of 32 bytes
calldatacopy(outputPtr, inputPtr, len)
inputPtr := add(inputPtr, len)
outputPtr := add(outputPtr, len)
// continue to skip the offset copy logic that is shared by the other 3 cases

⚠️ Potential issue | 🟠 Major

Bounds-check literal copy in calldata variant

Pre-check avoids accidental OOB calldatacopy reverts and clarifies intent.

-                    // copy len bytes from input to output in chunks of 32 bytes
-                    calldatacopy(outputPtr, inputPtr, len)
+                    // bounds check: inputPtr + len must not exceed inputEnd
+                    if gt(add(inputPtr, len), inputEnd) { revert(0, 0) }
+                    calldatacopy(outputPtr, inputPtr, len)
📝 Committable suggestion


Suggested change
// copy len bytes from input to output in chunks of 32 bytes
calldatacopy(outputPtr, inputPtr, len)
inputPtr := add(inputPtr, len)
outputPtr := add(outputPtr, len)
// continue to skip the offset copy logic that is shared by the other 3 cases
// bounds check: inputPtr + len must not exceed inputEnd
if gt(add(inputPtr, len), inputEnd) { revert(0, 0) }
calldatacopy(outputPtr, inputPtr, len)
inputPtr := add(inputPtr, len)
outputPtr := add(outputPtr, len)
// continue to skip the offset copy logic that is shared by the other 3 cases

Comment on lines +171 to +183
// copying in will not work if the offset is larger than the len being copied, so we compute
// `step = Math.min(len, offset)` and use it for the memory copy in chunks
for {
let ptr := outputPtr
let end := add(outputPtr, len)
let step := xor(offset, mul(lt(len, offset), xor(len, offset))) // min(len, offset)
} lt(ptr, end) {
ptr := add(ptr, step)
} {
mcopy(ptr, sub(ptr, offset), step)
}
outputPtr := add(outputPtr, len)
}

⚠️ Potential issue | 🔴 Critical

Guard invalid offsets in calldata variant

Same offset checks as memory variant to prevent infinite loops and OOB reads.

                 // `step = Math.min(len, offset)` and use it for the memory copy in chunks
                 for {
                     let ptr := outputPtr
                     let end := add(outputPtr, len)
                     let step := xor(offset, mul(lt(len, offset), xor(len, offset))) // min(len, offset)
                 } lt(ptr, end) {
                     ptr := add(ptr, step)
                 } {
-                    mcopy(ptr, sub(ptr, offset), step)
+                    // Reject invalid offsets: zero would stall the loop; too-large offsets read before output start.
+                    if or(iszero(offset), gt(offset, sub(outputPtr, add(output, 0x20)))) { revert(0, 0) }
+                    mcopy(ptr, sub(ptr, offset), step)
                 }
📝 Committable suggestion


Suggested change
// copying in will not work if the offset is larger than the len being copied, so we compute
// `step = Math.min(len, offset)` and use it for the memory copy in chunks
for {
let ptr := outputPtr
let end := add(outputPtr, len)
let step := xor(offset, mul(lt(len, offset), xor(len, offset))) // min(len, offset)
} lt(ptr, end) {
ptr := add(ptr, step)
} {
mcopy(ptr, sub(ptr, offset), step)
}
outputPtr := add(outputPtr, len)
}
// copying in will not work if the offset is larger than the len being copied, so we compute
// `step = Math.min(len, offset)` and use it for the memory copy in chunks
for {
let ptr := outputPtr
let end := add(outputPtr, len)
let step := xor(offset, mul(lt(len, offset), xor(len, offset))) // min(len, offset)
} lt(ptr, end) {
ptr := add(ptr, step)
} {
// Reject invalid offsets: zero would stall the loop; too-large offsets read before output start.
if or(iszero(offset), gt(offset, sub(outputPtr, add(output, 0x20)))) { revert(0, 0) }
mcopy(ptr, sub(ptr, offset), step)
}
outputPtr := add(outputPtr, len)
🤖 Prompt for AI Agents
contracts/utils/compression/Snappy.sol lines 171-183: the calldata copy loop
currently assumes the provided offset is valid, which can cause infinite loops
or out-of-bounds reads; mirror the same guards used in the memory variant by
validating the offset before entering the loop (ensure offset > 0 and cap offset
to len or revert if offset > len), compute step as min(len, offset) safely, and
if step is zero revert or handle as no-op so the loop cannot become infinite or
perform OOB mcopy.
